CHEP03, Ja Jolla, California, March 24-28, 2003 



1 



GRAPPA: Grid Access Portal for Physics Applications 

Daniel Engh and Robert Gardner 

University of Chicago, Chicago, IL 

Shava Smallen 

SDSC, San Diego, CA, 

Jerry Gieraltowski 

Argonne National Lab, Chicago, IL 

Liang Fang, Dennis Gannon, and Randy Bramley 

Indiana University, Bloomington, IN 



o 
o 

(N 



Grappa is a Grid portal effort designed to provide physicists convenient access to Grid tools and services. The 
ATLAS analysis and control framework, Athena, was used as the target application. Grappa provides basic 
Grid functionality such as resource configuration, credential testing, job submission, job monitoring, results 
monitoring, and preliminary integration with the ATLAS replica catalog system, MAGDA. Grappa uses Jython 
to combine the ease of scripting with the power of java-based toolkits. This provides a powerful framework for 
accessing diverse Grid resources with uniform interfaces. The initial prototype system was based on the XCAT 
Science Portal developed at the Indiana University Extreme Computing Lab and was demonstrated by running 
Monte Carlo production on the U.S. ATLAS test-bed. The portal also communicated with a European resource 
broker on WorldGrid as part of the joint iVDGL-DataTAG interoperability project for the IST2002 and SC2002 
demonstrations. The current prototype replaces the XCAT Science Portal with an xbooks jetspeed portlet for 
managing user scripts. 
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1. Introduction 

As the computational demands of High-Energy 
Physics (HEP) rises, deploying physics analyses across 
the computational "Grid" [llQl promises to meet much 
of this demand. However, Grid-enabling physics ap- 
plications adds the complexity of using a wide array 
of computing systems that differ in their specific con- 
figurations and usage policies. We have developed 
Grappa: Grid portal access for physics applications^^ , 
a Grid portal that provides a user-friendly approach 
for performing tasks on the Grid such as: submitting 
and monitoring jobs, monitoring system performance, 
and browsing results. 

CPUs, Data Storage, software libraries are dis- 
tributed across the Grid. A Grid portal, accessible 
from any browser, consolidates access to these re- 
sources with uniform interfaces for the end-user who 
can then think in higher-level terms such as total com- 
putational need or data sets rather than individual 
CPUs or files, for example. 

Physicists are typically familiar with HTML for pre- 
senting static information, but they are typically less 
familiar with methods to develop portals with more 
advanced functionality. Grappa provides to the physi- 
cist a modern, user-friendly framework for construct- 
ing web portals with HTML forms and commonly un- 
derstood scripting languages. 

The current Grappa prototype uses the scripting 
language JythonH with the Java Commodity Grid 
(Java CoG)0, providing Grid tools. With pre- 
configured Jython wrapper functions to commonly 
used Java tools, a physicist-user can readily develop 
customized portals with a minimum of Java program- 
ming experience. 



2. The Portal Framework 

Grappa has undergone several revisions since the 
initial prototypes began in the fall of 2001. The 
initial Grappa portal was based on the XCATSP0 
with demonstrations of several successive notebooks 
(and xbooks) for ATLAS^] job submission. Other 
groups have developed portals with features similar 
to Grappa, but portals based on differing underlying 
frameworks have incompatible components, hindering 
the ability to interchange these portal components. 
Jetspeed 9J has been widely adopted as a standard 
framework for portal components (portlets), allowing 
the development of portlets with generic functionality 
that can be shared among many different specialties 
(physics, chemistry, etc.). Several portlets have been 
developed by the Indiana University Extreme Com- 
puting Lab[l3| and incorporated into Grappa. 

The current Grappa prototype uses four compo- 
nent layers: Tomcat, Jetspeed, Xbooks, and the user's 
xbook which consists of scripts and HTML forms. Our 
Grid portal architecture uses Java "Portlet" technol- 
ogy to support secure interaction with Globus and 
Grid web Services. Each portlet defines a window 
on a Grid service or instances of a Grid application 
using the Java CoG kit. Users can customize their 
own portal layouts by choosing and organizing differ- 
ent portlets. Grappa is a deployment of portlet combi- 
nations including the xbooks portlet, tailored to some 
of the specific needs for HEP data analyses. 



2.1. Xbooks 

Xbooks ^3 provides the engine for interpreting the 
xbook scripts, which in our case use Jython. To use 
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Figure 1: Grappa job submission schematic. A web or command- line interface communicates with the portal which 
interacts with grid resources. 



xbooks, a portal (e.g. Grappa) is configured with an 
xbooks portlet. This portlet contacts an xbooks di- 
rectory service and the user can choose a particular 
xbook to run. The xbook portlet finds an xbook server 
that hosts the xbook, and this server sends the port- 
let a HTML form. The user- filled form values are sent 
to the xbooks manager which archives them and uses 
them to configure and launch the application. The 
archival of the job input and submission values allows 
the user to review, monitor, and even re-submit the 
jobs. 

2.2. Scripting 

Xbooks supports multiple scripting languages such 
as Python, Jython, and Perl. Grappa uses Jython 
to leverage the availability of Java toolkits. Jython 
wrapper functions to Java CoG tools provide easy 
scripting interfaces for gsiftp file transfer, GRAM job 
submission and GRAM job monitoring. The xbook- 
developer creates HTML form interfaces for entering 
information and scripts for processing this informa- 
tion. These scripts are split into 2 groups: scripts 
that run on the portal host, and scripts that run on 
the compute host. 

Before jobs are submitted onto a Grid-enabled clus- 
ter, Grappa makes sure an updated cache of Grappa 
tools is available on that cluster. These tools are pack- 
aged in a jar file that includes the Java CoG kit and a 
core set of Grappa Jython scripts. Grappa then sub- 
mits jobs to the Globus job-manager available for that 
system. When these jobs run, they then have access 
to the Grappa-installed Java tools which include the 
Java CoG Globus client tools for gsiftp file transfer, 
etc. 



Jobs can be configured to search for locally installed 
libraries or to temporarily install these libraries if 
needed, in our case the ATLAS Athena libraries. Ap- 
plication files, input data, and executables are trans- 
ferred from the Grid location specified on the web 
form. The job executes, transfers results to the re- 
sults location, and finally cleans up after itself. 



3. Prototypes 



The basic Grappa prototype is illustrated in 
Figure n The user interacts with the portal via 
web browser or command-line interfaces. The portal 
scripts use Java CoG to submit to grid resources. The 
portal can monitor GRAM information for each job 
and the web interface can be used to monitor portal 
job 'side-effects' such as its MAGDA file registration. 

We have developed and tested several Grappa 
prototypes. The initial prototypes, based on the 
XCATSP demonstrated resource management, job 
submission, job monitoring, and browsing of results. 
These features were demonstrated in ATLAS Monte 
Carlo data challenges on U.S. ATLAS test-beds. The 
current prototype replaces XCATSP with Jetspeed 
portlets, but uses essentially the same Jython scripts 
and HTML forms from previous versions. With 
the exception of the resource management forms, 
the Grappa scripts have been successfully ported to 
xbooks and the new Jetspeed-based portal demon- 
strates significantly improved fiexibility in portal con- 
figuration. 
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3.1. Resource Management 

Information on the cluster, Grid, user, and appli- 
cation installations for a compute resource is needed 
to submit jobs on that resource. The current model 
for managing this data in Grappa is to enter this in- 
formation on a portal web form, and this informa- 
tion is then stored in a portal database. Compute- 
cluster information stored in the database includes 
the operating system and Java versions, the number 
of processors on that cluster, and firewall informa- 
tion. Grid information for a resource includes the 
type of job manager, the Globus job-manager contact 
string (as specified in the Globus GRAM specifica- 
tion), and the gsiftp or gridftp server contact strings. 
Application-specific information, such as the location 
of application libraries, may also be entered in the 
resource database. Application-specific information, 
however, is optional for the resource database since 
the portal is designed with the flexibility to be able to 
run applications on generic systems that do not have 
any application-specific software pre-installed. An im- 
proved model for managing this data would be to dy- 
namically access resource information via MDS[l3. 

The user submits a second resource management 
web form to test the availability of selected resources. 
Grappa uses Globus authentication to restrict portal 
access and provide access by proxy to grid resources. 
The initial model used credentials obtained from the 
.globus directory of the user that instantiated the por- 
tal. A future model to use a Jetspeed portlet to man- 
age proxies obtained from a MyProxy Server has also 
been demonstrated. 

3.2. Job Submission 

The user specifies additional information on a job 
submission web form. This information includes loca- 
tions for input and output data, user-specific applica- 
tions and libraries, and (for example) collaboration- 
certified library packages. The user enters these lo- 
cations with the Uniform Resource Identifier (URI) 
format (protocol/host /path) which generalizes paths 
to the Grid level, so these locations can be scattered 
across the Grid while the user thinks in terms of a 
single form. The portal scripts use this information 
to dynamically construct a working environment on 
generic Grid-enabled compute sites. 

Additional application-specific information, such as 
the number of events and the physics model to simu- 
late are entered and passed as arguments from the por- 
tal to the application scripts (e.g. jobOptions files). 

Rather than submitting jobs to specific sites, the 
user operates on a higher level and defines an ac- 
tive set of compute resources selected from the com- 
pute resources-previously entered into the compute 
resource database and checked for user accessibility. 
Grappa submits jobs to this set of resources and. 



transparent to the user, the number of jobs submitted 
to each site is proportioned to the relative computing 
power of that site. 

A portion of the job submission form is visible in 
Figure H 

3.3. Job Monitoring 

The user monitors Grappa jobs in two ways. Sub- 
mitted jobs appear on a web form. From this, the user 
selects lists of jobs to monitor. Grappa then queries 
the GRAM reporters on the compute sites to obtain 
the status for each job. A screen-shot of the Grappa 
job monitoring form is shown in Figure 13 Links of 
system monitors such as Ganglia were added to the 
portal to provide a second method to monitor Grappa 
jobs and cluster performance. Additionally links to 
the ATLAS repfica catalog system, MAGDA[i3, pro- 
vide the user the ability to browse file locations from 
within the portal. 

3.4. Browsing Results 

Visual results of ATLAS Monte Carlo ntuples were 
generated with PAW. The user enters locations for a 
PAW package and a user PAW macro (kumac) into 
a web form. Grappa then installs and runs PAW on 
the ntuples as they are created. Visual monitors of 
individual jobs are available upon completion of each 
job. In addition a summary plot dynamically updates 
with the results of the entire multi-file dataset as it is 
being generated. The Grappa-produced graphics files 
and log files can be browsed using a GridFTP File 
Manager Jetspeed portlet. 

3.5. Data Challenges 

The Grappa framework is packaged and installed on 
submit hosts with PACMAnQ. As the target ap- 
plication we used the ATLASlq analysis and control 
framework, Athena, configured to run a fast Monte 
Carlo simulation, Atlfast. Athena libraries and PAW 
are available as tar files that Grappa dynamically in- 
stalls on any compute site as needed. 

The portal is designed to submit to Globus- 
supported job-mangers. The U.S. test-bed used for 
Grappa data challenges provided access to about 100 
CPUs across 15 different sites using condor, Isf, pbs, 
and fork Globus job-managers. The U.S. ATLAS 
grid test-bed included contributions from the Univer- 
sities of: Texas (Arlington), Oklahoma, Chicago, Indi- 
ana, Michigan, Wisconsin, Florida, and Boston, plus 
Fermi, Argonne, Berkeley, and Brookhaven National 
Labs. To demonstrate interoperability with a Euro- 
pean resource broker, the portal communicated with 
the resource broker (INFN) using the Globus fork job- 
manager, but the application was packaged to provide 
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Figure 2: A screen-shot showing a portion of the Grappa submission form. The user enters grid locations for input and 
output files, job parameters, etc. 



a Job Description Language ( JDL) script to the broker 
which then resubmits the job to its own resources. 

Grappa performance is limited by the wait time be- 
tween each job submission. The delays in contacting 
remote resources using Grappa/ Java Cog were simi- 
lar to the delays seen using Globus via a Unix Shell or 
Condor- Gfisj. The long wait that occurs when large 
numbers of jobs (greater than 25) resulted in a "busy" 
web form for longer than 10 minutes, making these 
large scale submissions somewhat impractical from an 
interactive web form. This problem was solved by cre- 
ating a non-interactive command line interface to the 
portal using the Cactus^^ toolkit. For large-scale 
long-duration data production, cron is used to submit 
the desired frequency of jobs. This saves the web inter- 
face from long wait times so the status and progress of 
web-submitted and command-line-submitted jobs can 
then be viewed from the web interface whenever the 
user wishes. 



4. Summary 

Grid Portals promise to simplify for the end-user 
access to diverse Grid resources. Grappa provides a 
powerful set of tools allowing development of platform 



independent user-interfaces accessible via standard 
Internet protocols. We have demonstrated with 
ATLAS Monte Carlo data challenges that Grappa 
can be a useful interface for controlling large scale 
data production on the Grid. 
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